-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix rendering dbt tests with multiple parents #1433
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.
|
Deploying astronomer-cosmos with Cloudflare Pages
|
tatiana
changed the title
WIP: Fix rendering DAGs with tests with multiple parents (
WIP: Fix rendering DbtDags with tests with multiple parents (Dec 27, 2024
AFTER_EACH
& BUILD
)AFTER_EACH
& BUILD
)
tatiana
changed the title
WIP: Fix rendering DbtDags with tests with multiple parents (
WIP: Fix rendering tests with multiple parents (Dec 27, 2024
AFTER_EACH
& BUILD
)AFTER_EACH
& BUILD
)
tatiana
changed the title
WIP: Fix rendering tests with multiple parents (
WIP: Fix rendering dbt tests with multiple parents (Dec 27, 2024
AFTER_EACH
& BUILD
)AFTER_EACH
& BUILD
)
tatiana
changed the title
WIP: Fix rendering dbt tests with multiple parents (
WIP: Fix rendering dbt tests with multiple parents
Dec 27, 2024
AFTER_EACH
& BUILD
)
tatiana
changed the title
WIP: Fix rendering dbt tests with multiple parents
Fix rendering dbt tests with multiple parents
Dec 27, 2024
dosubot
bot
added
the
size:L
This PR changes 100-499 lines, ignoring generated files.
label
Dec 27, 2024
dosubot
bot
added
area:rendering
Related to rendering, like Jinja, Airflow tasks, etc
dbt:test
Primarily related to dbt test command or functionality
labels
Dec 27, 2024
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1433 +/- ##
==========================================
+ Coverage 96.92% 96.94% +0.02%
==========================================
Files 73 73
Lines 4320 4350 +30
==========================================
+ Hits 4187 4217 +30
Misses 133 133 ☔ View full report in Codecov by Sentry. |
…AG with multiple tests
tatiana
force-pushed
the
tests-multiple-parents
branch
from
December 27, 2024 17:11
1111195
to
1c97e32
Compare
pankajkoti
approved these changes
Dec 30, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice support added.
pankajastro
approved these changes
Dec 30, 2024
Merged
pankajastro
added a commit
that referenced
this pull request
Dec 30, 2024
1.8.1 (2024-12-30) -------------------- Bug Fixes * Fix rendering dbt tests with multiple parents by @tatiana in #1433 * Add ``kwargs`` param in DocsOperator method ``upload_to_cloud_storage`` by @pankajastro in #1422 Docs * Improve OpenLineage documentation by @tatiana in #1431 Others * Enable Docs DAG in CI leveraging existing CI connections by @pankajkoti in #1428 * Install providers with airflow by @pankajkoti in #1432 * Remove unused docs dependency by @pankajastro in #1414 * Pre-commit hook updates in #1424 --------- Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com> Co-authored-by: Pankaj Koti <pankajkoti699@gmail.com>
1 task
tatiana
added a commit
that referenced
this pull request
Jan 14, 2025
As part of #1433 (Cosmos 1.8.1), we introduced a bug: detached test nodes were being incorrectly displayed when using: - `TestBehavior.NONE` (when no tests should be run) - `TestBehavior.AFTER_ALL` (when all tests should be run by the end of the DAG, regardless of their type) This PR solves this issue. Closes: #1444
tatiana
added a commit
that referenced
this pull request
Jan 14, 2025
As part of #1433 (Cosmos 1.8.1), we introduced a bug: detached test nodes were being incorrectly displayed when using: - `TestBehavior.NONE` (when no tests should be run) - `TestBehavior.AFTER_ALL` (when all tests should be run by the end of the DAG, regardless of their type) This PR solves this issue. Closes: #1444
tatiana
added a commit
that referenced
this pull request
Jan 14, 2025
Since we introduced detached test tasks in #1433 (released in 1.8.0) users started facing issues due to very long task names exceeding Airflow's limits. Example of stacktrace reported by user: ``` Traceback (most recent call last): File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 968, in __init__ validate_key(task_id) File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/helpers.py", line 55, in validate_key raise AirflowException(f"The key has to be less than {max_length} characters") airflow.exceptions.AirflowException: The key has to be less than 250 characters ``` This PR fixes this issue. In case the name exceeds Airflow's limit (250 ATM), it will name the detached test using: - "detached_{incremental unique number}_test" Closes: #1440
tatiana
added a commit
that referenced
this pull request
Jan 15, 2025
…1463) As part of #1433 (Cosmos 1.8.1), we introduced a bug: detached test nodes were being incorrectly displayed when using: - `TestBehavior.NONE` (when no tests should be run) - `TestBehavior.AFTER_ALL` (when all tests should be run by the end of the DAG, regardless of their type) This PR solves this issue. Closes: #1444
tatiana
added a commit
that referenced
this pull request
Jan 15, 2025
Since we introduced detached test tasks in #1433 (released in 1.8.0) users started facing issues due to very long task names exceeding Airflow's limits. Example of stacktrace reported by user: ``` Traceback (most recent call last): File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 968, in __init__ validate_key(task_id) File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/helpers.py", line 55, in validate_key raise AirflowException(f"The key has to be less than {max_length} characters") airflow.exceptions.AirflowException: The key has to be less than 250 characters ``` This PR fixes this issue. In case the name exceeds Airflow's limit (250 ATM), it will name the detached test using: - "detached_{incremental unique number}_test" Closes: #1440
tatiana
added a commit
that referenced
this pull request
Jan 15, 2025
Since we introduced detached test tasks in #1433 (released in 1.8.0) users started facing issues due to very long task names exceeding Airflow's limits. Example of stacktrace reported by user: ``` Traceback (most recent call last): File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 968, in __init__ validate_key(task_id) File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/helpers.py", line 55, in validate_key raise AirflowException(f"The key has to be less than {max_length} characters") airflow.exceptions.AirflowException: The key has to be less than 250 characters ``` This PR fixes this issue. In case the name exceeds Airflow's limit (250 ATM), it will name the detached test using: - "detached_{incremental unique number}_test" Closes: #1440
tatiana
added a commit
that referenced
this pull request
Jan 15, 2025
Since we introduced detached test tasks in #1433 (released in 1.8.0), users started facing issues due to very long task names exceeding Airflow's limits. Example of Python traceback reported by user: ``` Traceback (most recent call last): File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 968, in __init__ validate_key(task_id) File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/helpers.py", line 55, in validate_key raise AirflowException(f"The key has to be less than {max_length} characters") airflow.exceptions.AirflowException: The key has to be less than 250 characters ``` This PR fixes this issue. In case the name exceeds Airflow's limit (250 ATM), it will name the detached test using: - "detached_{incremental unique number}_test" We also considered naming the new test using: - "parent1_parent2_..._test" - but that may not solve the issue, especially in circumstances where the same parents may have multiple detached nodes. Perhaps in future, we could bundle them in a single task. Closes: #1440
tatiana
added a commit
that referenced
this pull request
Jan 15, 2025
Since we introduced detached test tasks to fix a customer issue in #1433 (release 1.8.1), we changed how Cosmos renders DAGs, with the chance that Cosmos significantly changed how it renders a dbt project - even when users did not change their `DbtDag` or `DbtTaskGroup` configuration. This is unacceptable in a micro release - and for this reason we're reverting this change and making the feature opt-in. PR #1433 led to issues such as #1464, reported by multiple Cosmos users, and also issues that did not become Github issues, such as an Astro customer who reported that when they upgraded to Cosmos 1.8.1, the number of tasks increased dramatically. One problem with #1433 was that it did not empower users to opt in or out of having detached test nodes, solving the problem for some but causing problems for many. This PR aims to solve this problem by introducing a new property to `RenderConfig`: `should_detach_multiple_parents_tests`. We are reverting the Cosmos DAG rendering to what it was in 1.8.0 and before: by default, it will not detach tests with multiple parents. Users must opt-in for this behaviour if and when they want to. We understand this may be perceived as a breaking change by some, but it is the correct way to move forward and avoid causing further disruption. We are planning to review the current implementation, as described in #1469. For now, this PR documents the current behaviour and empowers users.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area:rendering
Related to rendering, like Jinja, Airflow tasks, etc
dbt:test
Primarily related to dbt test command or functionality
size:L
This PR changes 100-499 lines, ignoring generated files.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
If these two circumstances are met:
DbtDag
orDbtTaskGroup
useTestBehavior.AFTER_EACH
(default) orTestBehavior.BUILD
Cosmos 1.8.0 and previous versions would attempt to run the same test multiple times after each parent model run, likely failing if any of the parents hadn't been run yet.
This PR aims to fix this behaviour by not running tests with multiple dependencies within each task group / build task - and by adding those tests to run only once and after all parents have run.
Related issues
Closes: #978
Closes: #1365
This change also sets the ground for adding support to tests that don't have any dependencies, a problem discussed in the following tickets:
How to reproduce
There are two steps to reproduce this problem:
DbtDag
that uses this dbt project to reproduce the original problemRepresentative dbt project
We created a dbt project named
multiple_parents_test
that has a test calledcustom_test_combined_model
that depends on two models:The expectation from a user perspective is that, since the
combined_model
depends onmodel_a
, that themultiple_parents_test
will only be run after both models were run, once.Definitions of the test:
By running the following
dbt build
command, we confirm that the test depends on both models:This is what the pipeline topology looks like:
The source code structure for this dbt project:
When running
dbt ls
, it displays:Behavior in Cosmos
The DAG
example_multiple_parents_test
uses this new dbt project:When trying to run it using:
Users face the original error because the test is being attempted to be run after
model_a
was run but beforecombined_model
is run:Excerpt from the logs of the failing task:
Behaviour after this change
With this change, when running the DAG mentioned above, it results in:
And it can successfully be run.
Breaking Change?
This PR slightly changes the behaviour of Cosmos DAG rendering when using
TestBeahavior.AFTER_EACH
orTestBeahavior.BUILD
when there are tests with multiple parents. Some may consider it a breaking change, but a bug fix is a better classification since Cosmos did not support rendering many dbt projects that met these circumstances.The behaviour change in those cases is that we're isolating tests that depend on multiple parents and running them outside of the
TestBehaviour.AFTER_EACH
dbt node Cosmos TaskGroup orTestBehaviour.BUILD
.This change will likely highlight any tests that depended on multiple models and were not failing previously but running as part of the tests of both models.